feat(waterdata): get_queryables + queryables monitor + passthrough enablement #333

Merged: thodson-usgs merged 6 commits into DOI-USGS:main from
thodson-usgs:feat/waterdata-queryables on Jul 1, 2026

@thodson-usgs

Collaborator

Draft — held until the upstream API is ready. Several of the newly
enabled queryables are accepted by the service but don't yet round-trip in
the data (e.g. filtering get_daily(state_name="Hawaii") returns rows with no
state_name column, and not all attributes filter reliably yet). The
get_queryables helper and the monitoring test are ready now; the
queryable-enablement waits on the upstream data side.

Summary

Three related changes around the Water Data OGC API's queryables (the
properties each collection can be filtered on):

  1. waterdata.get_queryables(collection) — returns a collection's queryable
    properties as a tidy (DataFrame, BaseMetadata), one row per property with
    its type, title, and description. Lets callers discover the available
    filters programmatically.

  2. A live monitoring test: tests/waterdata_queryables_test.py compares
    each collection's advertised queryables against a committed snapshot
    (tests/data/waterdata_queryables.json, 489 properties across 11
    collections). It fails when the upstream API adds / removes / renames a
    queryable, which is the signal to regenerate the snapshot and enable
    anything new.

  3. Passthrough enablement — the OGC data getters exposed ~11 of each
    collection's ~50 queryables as named params; the rest (mostly the shared
    monitoring-location attributes — state_name, county_code, site_type,
    altitude, …, now filterable on the data endpoints) were reachable only via
    the raw filter CQL. Each OGC getter now accepts **queryables, so any
    queryable can be passed as a filter:

    # filter daily discharge by a monitoring-location attribute
    df, md = waterdata.get_daily(parameter_code="00060", state_name="Wisconsin")
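
As a rough offline illustration of item 1, the sketch below turns a queryables document into one row per property. It assumes the service's queryables response is a JSON Schema object whose `properties` map each name to its `type`, `title`, and `description`. The sample payload and the `queryables_to_rows` helper are hypothetical stand-ins, not the package's code; the real `get_queryables` returns a DataFrame plus metadata.

```python
from typing import Any

# Hypothetical sample of a queryables response (the JSON Schema shape is assumed).
SAMPLE_QUERYABLES: dict[str, Any] = {
    "type": "object",
    "properties": {
        "parameter_code": {
            "type": "string",
            "title": "Parameter code",
            "description": "Hypothetical description text.",
        },
        "state_name": {"type": "string", "title": "State name"},
    },
}


def queryables_to_rows(schema: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per queryable property: name, type, title, description."""
    rows = []
    for name, spec in schema.get("properties", {}).items():
        rows.append(
            {
                "name": name,
                "type": spec.get("type"),
                "title": spec.get("title"),
                "description": spec.get("description"),  # None when absent
            }
        )
    return rows


rows = queryables_to_rows(SAMPLE_QUERYABLES)
for row in rows:
    print(row["name"], row["type"], row["title"])
```

Missing fields come back as `None` rather than raising, which keeps the table rectangular even when a property carries no description.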

How the passthrough works

get_daily, get_continuous, get_latest_continuous, get_latest_daily,
get_field_measurements, get_field_measurements_metadata, get_peaks,
get_channel, get_monitoring_locations, get_time_series_metadata, and
get_combined_metadata each gain **queryables. The shared
waterdata.utils._get_args flattens that kwargs dict into the request args, so a
passthrough filter is normalized (iterables → comma-joined, etc.) and sent
exactly like a named param. get_cql (the raw-CQL escape hatch) is intentionally
excluded.
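
A minimal sketch of the flattening step described above, for illustration only. `flatten_args` is a hypothetical stand-in for `_get_args`, and two details are assumptions rather than confirmed behavior: that unset (`None`) named params are dropped, and that only lists and tuples are comma-joined.

```python
from typing import Any


def flatten_args(local_vars: dict[str, Any]) -> dict[str, Any]:
    """Merge passthrough kwargs into the named args and normalize values."""
    args = dict(local_vars)  # copy so the caller's dict is untouched
    args.update(args.pop("queryables", {}))  # passthrough keys become top-level
    normalized = {}
    for key, value in args.items():
        if value is None:
            continue  # assumption: unset named params are not sent
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)  # iterables -> comma-joined
        normalized[key] = value
    return normalized


request_args = flatten_args(
    {
        "parameter_code": "00060",
        "monitoring_location_id": None,
        "queryables": {"state_name": ["Wisconsin", "Hawaii"]},
    }
)
print(request_args)
```

After the merge, nothing downstream can tell a passthrough filter from a named param, which is the property the description relies on.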

No client-side queryable list is bundled: the service validates names itself —
an unknown queryable returns HTTP 400, surfaced as the typed
DataRetrievalError. (The committed snapshot is used only by the monitoring
test, not for runtime validation, so it can't drift the package.)
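
To make the failure mode concrete, here is a hedged sketch of a non-200 response becoming a typed exception. The `DataRetrievalError` class below is a local stand-in (the package's real class may have a different constructor), `raise_for_non_200` is a hypothetical helper, and the error message text is invented.

```python
class DataRetrievalError(Exception):
    """Local stand-in for the package's typed error (the real class may differ)."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


def raise_for_non_200(status_code: int, body: str) -> None:
    """Raise a typed error for any non-200 response; return None on success."""
    if status_code != 200:
        raise DataRetrievalError(status_code, body)


try:
    # Simulated service reply to a misspelled queryable (message text is invented).
    raise_for_non_200(400, "unknown queryable 'state_nmae'")
except DataRetrievalError as err:
    caught = err
    print(caught)
```

Callers can branch on `status_code` to distinguish a rejected filter name from other failures, without parsing the message.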

Provisional — passthrough now, explicit named params later?

This PR uses a passthrough (**queryables). That decision is deliberate but
not final:

Why passthrough now

  • Compact: avoids adding ~40 near-identical params to each of 11 getters (a
    ~400-param explosion of mostly-shared location attributes).
  • Auto-tracks the API: when upstream adds a queryable, the monitoring test flags
    it and it's already usable — no per-getter code change to expose it.
  • DRY: one _get_args change enables every getter uniformly.

Why we may switch to explicit named params

  • Discoverability: explicit params show up in IDE autocomplete and
    help(get_daily); **queryables hides them.
  • Per-param docs & types: each queryable could carry its own description and
    type hint instead of one generic note.
  • Typo safety: a misspelled explicit param is a TypeError at the call site;
    a misspelled passthrough queryable is only caught at runtime as an HTTP 400.
  • Self-documenting surface: the signature would state exactly what each
    collection supports rather than "anything the service accepts."

The natural future step is to generate explicit params (with docstrings)
from the queryables snapshot, getting discoverability without hand-maintaining
~400 params. Until then, the passthrough unblocks the capability with minimal
surface area.
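
One way the generated-params idea could look, sketched under stated assumptions: the snapshot excerpt, `signature_from_snapshot`, and `get_daily_sketch` are all hypothetical, and a real generator would also emit docstrings and type hints. The sketch builds a keyword-only `inspect.Signature` from a list of queryable names, so a misspelled name fails with `TypeError` before any request is made.

```python
import inspect

# Hypothetical excerpt of the snapshot for one collection.
SNAPSHOT_DAILY = ["parameter_code", "state_name", "county_code", "site_type"]


def signature_from_snapshot(names: list[str]) -> inspect.Signature:
    """Build a keyword-only signature with one optional parameter per queryable."""
    params = [
        inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY, default=None)
        for name in names
    ]
    return inspect.Signature(params)


DAILY_SIGNATURE = signature_from_snapshot(SNAPSHOT_DAILY)


def get_daily_sketch(**kwargs):
    """Validate kwargs against the generated signature before any request."""
    bound = DAILY_SIGNATURE.bind(**kwargs)  # raises TypeError on unknown names
    return dict(bound.arguments)


# Advertise the generated signature to help() and IDE tooling.
get_daily_sketch.__signature__ = DAILY_SIGNATURE

print(inspect.signature(get_daily_sketch))
try:
    get_daily_sketch(state_nmae="Wisconsin")
except TypeError as err:
    print("rejected before any request:", err)
```

Setting `__signature__` is what makes the generated parameters visible to `help()` and `inspect.signature`; static IDE autocomplete would likely still need generated source or stub files.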

Verification

  • tests/waterdata_queryables_test.py — offline get_queryables parsing /
    error tests, offline passthrough tests (the filter reaches the /items
    request, lists comma-joined), and the 11-collection live monitor. All pass.
  • ruff check / ruff format / mypy --strict clean across the package.
  • Live sanity: a normal get_daily is unchanged by the **queryables addition;
    a passthrough state_name= filter is accepted by the service (no 400).
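
The comparison at the heart of the live monitor can be sketched as a set difference. This is an illustration with hypothetical data and a hypothetical `diff_queryables` helper, not the test's actual code; note that a rename appears as one removal plus one addition.

```python
def diff_queryables(snapshot: set[str], advertised: set[str]) -> dict[str, list[str]]:
    """Report queryables added to or removed from a collection since the snapshot."""
    return {
        "added": sorted(advertised - snapshot),
        "removed": sorted(snapshot - advertised),
    }


# Hypothetical data: the service renamed `site_type` to `site_type_code`.
snapshot = {"parameter_code", "state_name", "site_type"}
advertised = {"parameter_code", "state_name", "site_type_code"}

drift = diff_queryables(snapshot, advertised)
print(drift)

# A monitor would fail whenever either list is non-empty.
has_drift = bool(drift["added"] or drift["removed"])
```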

Before merge (once upstream is ready)

  • Regenerate the snapshot if queryables changed; confirm the held queryables now
    round-trip in the data.
  • Add a NEWS.md entry.
  • Decide passthrough vs. generated explicit params per the discussion above.

🤖 Generated with Claude Code

thodson-usgs and others added 2 commits June 24, 2026 11:15
Add `waterdata.get_queryables(collection)`, returning the OGC queryable
properties of a Water Data collection (`daily`, `continuous`,
`monitoring-locations`, ...) as a tidy `(DataFrame, BaseMetadata)` — one row per
filterable property with its type, title, and description.

Add `tests/waterdata_queryables_test.py`: offline parsing / error tests plus a
live monitor that compares each collection's advertised queryables against a
committed snapshot (`tests/data/waterdata_queryables.json`). The monitor fails
when the upstream API adds / removes / renames a queryable — the signal to
regenerate the snapshot and enable any new queryables on the matching getter.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd

The OGC data getters (`get_daily`, `get_continuous`, `get_peaks`, ...) exposed
~11 of each collection's ~50 queryables as named params; the rest — mostly the
shared monitoring-location attributes (`state_name`, `county_code`, `site_type`,
`altitude`, ...) now filterable on the data endpoints — were reachable only via
the raw `filter` CQL.

Accept any queryable as a passthrough kwarg: each OGC getter gains
`**queryables`, and the shared `_get_args` flattens it so an extra filter such
as `state_name="Wisconsin"` is normalized and sent exactly like a named param.
The service itself validates names (an unknown one returns HTTP 400 → typed
error), so no client-side queryable list is bundled.

The passthrough is provisional (see the PR description for the trade-off vs.
explicit per-property keyword arguments).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd

get_queryables duplicated the GET + _raise_for_non_200 sequence that
_check_ogc_requests already does for the identical "queryables" request
shape (it's req_type's own default), and hardcoded OGC_API_URL instead of
going through _ogc_base_url_var like every other URL-building call site in
engine.py. _check_ogc_requests now also returns the raw httpx.Response
(needed for BaseMetadata), so get_queryables can call it directly; the one
existing caller (shaping._deal_with_empty's schema lookup) is updated to
unpack the new tuple.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

thodson-usgs and others added 2 commits June 30, 2026 18:27
A ``**queryables`` parameter is always a dict (empty when the caller passes
no passthrough kwargs), never None — so the None sentinel and the
``if queryables:`` truthiness guard were unnecessary scaffolding. Collapse to
a single ``local_vars.update(local_vars.pop("queryables", {}))``; updating
with an empty dict is a harmless no-op, so the common (no-passthrough) path
behaves identically.
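
A small stand-alone illustration of the collapsed one-liner, assuming the getter captures its arguments with `locals()`. `getter_locals` is a hypothetical stand-in for a getter body.

```python
def getter_locals(parameter_code=None, **queryables):
    """Stand-in for a getter body: capture locals, then flatten the passthrough."""
    local_vars = dict(locals())  # {'parameter_code': ..., 'queryables': {...}}
    # **queryables is always a dict, so no None check or truthiness guard is needed.
    local_vars.update(local_vars.pop("queryables", {}))
    return local_vars


print(getter_locals(parameter_code="00060"))
# {'parameter_code': '00060'}
print(getter_locals(parameter_code="00060", state_name="Wisconsin"))
# {'parameter_code': '00060', 'state_name': 'Wisconsin'}
```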

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…check; correct the passthrough docstring

Two issues surfaced by an xhigh code review of the **queryables passthrough:

1. State mutual-exclusion guard bypass. `_with_state` ran apply_state's
   `reject=("state_code", "state_name")` check on `locals()` before the
   `**queryables` flatten, which lived in `_get_args` and ran afterwards. On
   get_time_series_metadata, where `state_code` is NOT an explicit parameter,
   `get_time_series_metadata(state="Wisconsin", state_code="55")` let
   `state_code` land in `**queryables`, slip past the guard, and then be
   flattened back in, silently sending BOTH state_name and state_code (server
   400) instead of raising the documented ValueError.
   Fix: extract `_flatten_queryables` and call it at the top of `_with_state`
   (before apply_state) as well as in `_get_args`. The flatten pops its key, so
   it's idempotent; the second call is a no-op. This is now robust for every
   _with_state getter regardless of which reject target is an explicit param.

2. Misleading docstring example. The shared `**queryables` docstring asserted
   `state_name`/`site_type_code` as the example queryables, but
   channel-measurements and field-measurements-metadata expose neither, so the
   copied example would 400 for those two getters. Reword (uniformly across all
   11) to present those as common-but-not-universal attributes and point to
   get_queryables for the collection's actual queryables.

Adds a unit test pinning the passthrough-via-queryables conflict case.
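
The ordering fix from item 1 can be sketched as below. All three functions are simplified, hypothetical stand-ins for `_flatten_queryables`, `apply_state`, and `_with_state`; the real guard does more than this check.

```python
def flatten_queryables(local_vars: dict) -> dict:
    """Pop the passthrough dict and merge it in place; a second call is a no-op."""
    local_vars.update(local_vars.pop("queryables", {}))
    return local_vars


def apply_state(local_vars: dict, reject=("state_code", "state_name")) -> dict:
    """Reject `state` combined with any explicit state filter (simplified guard)."""
    if local_vars.get("state") is not None:
        clashes = [k for k in reject if local_vars.get(k) is not None]
        if clashes:
            raise ValueError(f"`state` cannot be combined with {clashes}")
    return local_vars


def with_state(local_vars: dict) -> dict:
    """Flatten first so the guard also sees filters that arrived via **queryables."""
    return apply_state(flatten_queryables(local_vars))


call = {"state": "Wisconsin", "queryables": {"state_code": "55"}}
try:
    with_state(dict(call))
except ValueError as err:
    print("guard fired:", err)
```

Calling `apply_state` directly on the unflattened dict would not raise, because `state_code` is still nested under `queryables`; that is the bypass the commit closes.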

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread dataretrieval/waterdata/api.py Outdated
…yables

Per review feedback: drop the examples and per-collection caveats; just tell
users to call get_queryables to see a collection's queryables. Applied
uniformly across all 11 getters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs marked this pull request as ready for review July 1, 2026 03:18
@thodson-usgs thodson-usgs merged commit d8f630e into DOI-USGS:main Jul 1, 2026
9 checks passed
@thodson-usgs thodson-usgs deleted the feat/waterdata-queryables branch July 1, 2026 03:19

1 participant